The transition to renewable energy is a critical issue for mitigating climate change and promoting sustainable development. As the world’s second-largest carbon emitter (https://www.epa.gov/ghgemissions/global-greenhouse-gas-emissions-data), the United States has set ambitious targets for reducing greenhouse gas emissions and increasing renewable energy generation. However, the factors that drive renewable energy adoption in the US are not fully understood, and there is a need for more research to identify the most effective policies and incentives for promoting clean energy.
In this project, we aim to investigate the key drivers of renewable energy adoption in the US and their relative importance. We will use a comprehensive dataset derived from public-facing websites which includes information on state-level renewable energy capacity, energy policies, economic indicators, and demographic variables.
Our research questions include: 1. How do economic indicators, such as energy prices, GDP, and oil production, influence renewable energy adoption in different regions of the US? 2. What role do demographic variables, such as IQ and political affiliation, play in shaping attitudes towards renewable energy and driving adoption?
knitr::opts_chunk$set(echo = TRUE)
#1
#Install familiar packages
library(tidyverse);library(lubridate);library(viridis);library(here)
#install.packages("rvest")
library(rvest)
#install.packages("dataRetrieval")
library(dataRetrieval)
#install.packages("tidycensus")
library(tidycensus)
library(dplyr)
library(rvest)
library(ggplot2)
#install.packages("corrplot")
library(corrplot)
library(stringr)
library(sf)
library(leaflet)
library(mapview); mapviewOptions(fgb = FALSE)
sf::sf_use_s2(FALSE)
The dataset for this analysis was collected from various sources including the National Renewable Energy Laboratory (NREL), the United States Energy Information Administration (EIA), and wisevoter.com. The dataset includes information on various factors that may drive renewable energy adoption in the US, such as state-level policies, economic indicators, demographic characteristics, and natural resources.
The dataset was first imported into R as separate data frames and then merged into a single data frame using the merge() function. Before merging, data cleaning and manipulation were performed to ensure consistency across variables and eliminate missing values. For instance, some variables were renamed, and missing values were replaced with appropriate values or removed entirely.
The final dataset contains 50 observations (one for each state) and 7 variables. The variables in the dataset are summarized in the table below:
| Variable | Unit | Range / Central Tendency | Data Source |
|---|---|---|---|
| Good_Air_Quality_Days_Percentage | Percentage | 35.8-89.4 | wisevoter.com |
| Oil_Production | Thousand barrels per day | 0-4700 | wisevoter.com |
| GDP | Billion dollars | 54.7-2,784.1 | wisevoter.com |
| Sunshine | Hours per day | 3.7-10.3 | wisevoter.com |
| Electricity_Cost | Cents per kilowatt hour | 5.6-20.5 | wisevoter.com |
| Average_IQ | IQ points | 94.8-103.9 | wisevoter.com |
| Republican_vote | Percentage | 23.5-70.4 | wisevoter.com |
url_aq <- "https://wisevoter.com/state-rankings/air-quality-by-state/"
page1 <- read_html(url_aq)
states <- page1 %>%
html_nodes("td:nth-child(2)") %>%
html_text()
good.air.days <- page1 %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_aq <- data.frame(
States = states,
Good_Air_Quality_Days_Percentage = good.air.days
)
url_oil <- "https://wisevoter.com/state-rankings/oil-production-by-state/"
page2 <- read_html(url_oil)
states2 <- page2 %>%
html_nodes("td:nth-child(2)") %>%
html_text()
oil.production <- page2 %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_oil <- data.frame(
States = states2,
Oil_Production = oil.production
)
url_gdp <- "https://wisevoter.com/state-rankings/gdp-by-state/"
page <- read_html(url_gdp)
states3 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
gdp <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_gdp <- data.frame(
States = states3,
GDP = gdp
)
url_sunny <- "https://wisevoter.com/state-rankings/sunniest-states/"
page <- read_html(url_sunny)
states4 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
sunshine <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_sunny <- data.frame(
States = states4,
Sunshine = sunshine
)
url_elec <- "https://wisevoter.com/state-rankings/electricity-cost-by-state/"
page <- read_html(url_elec)
states5 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
cost <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_ecost <- data.frame(
States = states5,
Electricity_Cost = cost
)
url_iq <- "https://wisevoter.com/state-rankings/average-iq-by-state/"
page <- read_html(url_iq)
states6 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
iq <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_iq <- data.frame(
States = states6,
Average_IQ = iq
)
url_party <- "https://wisevoter.com/state-rankings/red-and-blue-states"
page <- read_html(url_party)
states7 <- page %>%
html_nodes("td:nth-child(1)") %>%
html_text()
rep_vote <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_party <- data.frame(
States = states7,
Republican_vote = rep_vote
)
# Merge data frames
state_rankings_df_merge <- Reduce(function(x, y) merge(x, y, by = "States", all = TRUE),
list(df_aq, df_oil, df_gdp, df_sunny, df_ecost, df_iq, df_party))
# View merged data frame
head(state_rankings_df_merge)
## States Good_Air_Quality_Days_Percentage Oil_Production GDP
## 1 Alabama 85.8% 425 Mbbl $257,465,000,000
## 2 Alaska 92.3% 159,623 Mbbl $57,983,000,000
## 3 Arizona 64.6% 6 Mbbl $429,819,000,000
## 4 Arkansas 83% 4,179 Mbbl $150,483,000,000
## 5 California 60.6% 130,593 Mbbl $3,513,348,000,000
## 6 Colorado 75.4% 144,621 Mbbl $440,903,000,000
## Sunshine Electricity_Cost Average_IQ Republican_vote
## 1 4,660 kJ/m² $0.1 per kWh 95.7 62%
## 2 <NA> $0.2 per kWh 99 52.8%
## 3 5,755 kJ/m² $0.11 per kWh 97.4 49.1%
## 4 4,724 kJ/m² $0.09 per kWh 97.5 62.4%
## 5 5,050 kJ/m² $0.2 per kWh 95.5 34.3%
## 6 4,960 kJ/m² $0.11 per kWh 101.6 41.9%
write.csv(state_rankings_df_merge, file = here("data_processed/state rankings_raw.csv"),
row.names = FALSE)
Next, we clean the data scraped from various websites before merging it with the power plant data. The dataframe we have right now consisted of columns made of chars. After removing the units (such as per kWh) and characters (such as comma and $), we converted the chars to numeric values and created additional columns as needed based on existing columns.
# read in raw data
state_ranking <- read.csv(here("data_processed/state rankings_raw.csv"), header = TRUE)
# save a copy of raw data before making changes
state_ranking_raw <- state_ranking
# remove % in good_air_q col
state_ranking$Good_Air_Quality_Days_Percentage <- str_sub(state_ranking$Good_Air_Quality_Days_Percentage, end = -2)
# change chr to numeric
state_ranking$Good_Air_Quality_Days_Percentage <- as.numeric(state_ranking$Good_Air_Quality_Days_Percentage)
# remove unit in oil_production
state_ranking$Oil_Production <- str_sub(state_ranking$Oil_Production, end = -5)
# remove comma
state_ranking$Oil_Production <- gsub(',','', state_ranking$Oil_Production)
# change chr to numeric
state_ranking$Oil_Production <- as.numeric(state_ranking$Oil_Production)
# remove $ in GDP
state_ranking$GDP <- str_sub(state_ranking$GDP, 2)
# remove comma
state_ranking$GDP <- gsub(',','', state_ranking$GDP)
# change to numeric
state_ranking$GDP <- as.numeric(state_ranking$GDP)
# remove unit in sunshine
state_ranking$Sunshine <- str_sub(state_ranking$Sunshine, end=-6)
# remove comma
state_ranking$Sunshine <- gsub(',','', state_ranking$Sunshine)
# change to numeric
state_ranking$Sunshine <- as.numeric(state_ranking$Sunshine)
# clean electricity_cost
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, end=-8)
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, 2)
state_ranking$Electricity_Cost <- as.numeric(state_ranking$Electricity_Cost)
# clean average
state_ranking$Average_IQ <- as.numeric(state_ranking$Average_IQ)
# clean republican_vote
state_ranking$Republican_vote <- str_sub(state_ranking$Republican_vote, end=-2)
state_ranking$Republican_vote <- as.numeric(state_ranking$Republican_vote)
# create new column: party affiliation
state_ranking$party <- with(state_ranking, ifelse(Republican_vote > 50, "Red", "Blue"))
For the response variable, clean energy adoption rate, data wrangling and calculation are needed. We first transform all NAs to 0 so that we can later aggregate the capacity for each energy source. The next step is to calculate the total clean energy capacity and total energy generation capacity for each power plant. After that, we aggregate all the power plants by states and calculate the clean energy percentage for each state.
Power_plant <- read.csv(here("data_raw/Power_Plants.csv"), stringsAsFactors = TRUE)
Power_plant[is.na(Power_plant)] <- 0
Power_plant <- Power_plant %>%
mutate(Total_clean = Bio_MW + Geo_MW + Hydro_MW + HydroPS_MW + Nuclear_MW + Solar_MW + Wind_MW) %>%
mutate(Total = Total_clean + Coal_MW + NG_MW + Crude_MW)
Clean_energy <- aggregate(cbind(Power_plant$Total_clean, Power_plant$Total), list(Power_plant$State), FUN = sum)
colnames(Clean_energy) <- c("States", "Clean_energy_MW", "Total_MW")
Clean_energy$Clean_percent <- Clean_energy$Clean_energy_MW/Clean_energy$Total_MW
We merge the two clean datasets as one final dataset that is ready for analysis.
final_df <- merge(Clean_energy, state_ranking)
We plot correlation plots to explore the covariance among response and explanatory variables.
final_naomit <- final_df %>%
select(Clean_percent:Republican_vote) %>%
na.omit()
final_corr <- cor(final_naomit)
corrplot.mixed(final_corr, upper = "ellipse", tl.cex = 0.6)
We want to take a deeper dive into the distribution of each variables by histograms. Based on the histograms, oil production, GDP and electricity cost are skewed, which indicates that transformation is needed for those variables.
par(mfrow=c(2,4))
hist(final_naomit$Clean_percent, xlab = "Clean Energy Adoption %", main = NULL)
hist(final_naomit$Good_Air_Quality_Days_Percentage, xlab = "Good Air Quality Days %", main = NULL)
hist(final_naomit$Oil_Production, xlab = "Good Air Quality Days %", main = NULL)
hist(final_naomit$GDP, xlab = "GDP", main = NULL)
hist(final_naomit$Sunshine, xlab = "Sunshine", main = NULL)
hist(final_naomit$Electricity_Cost, xlab = "Electricity Cost", main = NULL)
hist(final_naomit$Average_IQ, xlab = "Average IQ", main = NULL)
hist(final_naomit$Republican_vote, xlab = "Republican Vote", main = NULL)
Next, we visualize our power plants data by making an interactive map widget with the leaflet package. We first categorized the power plant as “clean energy” or “other” based on the “primary source” attribute - we considered biomass, geothermal, hydroelectric, pumped storage, nuclear, solar, and wind to be clean. For the map, we plotted clean energy power plants as blue and other as red; the size of the markers was scaled to indicate the maximum output and was expressed in megawatts in original data. You can pan around and zoom in and out to explore the dataset.
# read in the shapefile
powerp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp')
## Reading layer `Power_Plants' from data source
## `/home/guest/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 11569 features and 32 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -171.7124 ymin: 17.94712 xmax: -65.27956 ymax: 71.292
## Geodetic CRS: WGS 84
powerp$PrimSource <- as.factor(powerp$PrimSource)
unique(powerp$PrimSource)
## [1] petroleum hydroelectric natural gas nuclear coal
## [6] pumped storage geothermal wind biomass batteries
## [11] other solar
## 12 Levels: batteries biomass coal geothermal hydroelectric ... wind
powerp$PrimSource <- str_trim(powerp$PrimSource, 'both')
# clean energy: biomass, geothermal, hydroelectric, pumped, nuclear, solar, wind
# create new column: clean energy
powerp$clean <- with(powerp,
ifelse(PrimSource=='biomass'|PrimSource=='geothermal'|PrimSource=='hydroelectric'|PrimSource=='pumpedstorage'|PrimSource=='nuclear'|PrimSource=='solar'|PrimSource=='wind', 'clean energy', 'others'))
powerp$clean <- as.factor(powerp$clean)
# colors for the map widget
pal <- colorFactor(c('skyblue','tomato'), powerp$clean)
# make the map widget
leaflet(powerp) %>%
addProviderTiles(providers$CartoDB.Positron)%>%
addCircleMarkers(
radius = ~sqrt(Total_MW)/5,
color = ~pal(clean),
stroke = FALSE, fillOpacity = 0.5
) %>%
addLegend("bottomright", pal = pal, values = ~clean,
title = "Power plants",
opacity = 1
)
We then mapped the clean energy percentage in each state. Thanks to the help from https://rstudio.github.io/leaflet/choropleths.html, we were able to make this interactive map widge. You can pan around, zoom, and hover your mouse over each state to see its clean enegery percentage and state name in a popup window. The state boundaries data came from https://www.sciencebase.gov/catalog/item/52c78623e4b060b9ebca5be5.
# read in the shapefile
states_shp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp')
## Reading layer `tl_2012_us_state' from data source
## `/home/guest/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -19951910 ymin: -1643352 xmax: 20021890 ymax: 11554790
## Projected CRS: Popular Visualisation CRS / Mercator
states_shp <- st_transform(states_shp, c=4326)
# join the dataframe with clean energy percentage by state to the state shapefile
join_states <- full_join(states_shp, final_df, by = c('NAME' = 'States'))
# visualize on map
bins <- c(NA, 0, 0.20, 0.40, 0.60, 0.80, 1.00)
pal2 <- colorBin("Greens", domain = join_states$Clean_percent, bins = bins)
labels <- sprintf(
"<strong>%s</strong><br/>%g ",
join_states$NAME, join_states$Clean_percent) %>% lapply(htmltools::HTML)
leaflet(join_states) %>%
setView(-96, 37.8, 3) %>%
addPolygons(
fillColor = ~pal2(Clean_percent),
weight = 2,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7,
highlightOptions = highlightOptions(
weight = 5,
color = "#666",
dashArray = "",
fillOpacity = 0.7,
bringToFront = TRUE),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addLegend(pal = pal2, values = ~Clean_percent, opacity = 0.7, title = NULL,
position = "bottomright")